fix: add process-level credential cache#2272
Conversation
📝 WalkthroughWalkthroughAdds a process-level in-memory credential cache keyed by realm and authentication chain. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/auth/manager_chain.go (1)
55-87:⚠️ Potential issue | 🟠 MajorConcurrent misses still fan out authentication.
This is race-safe, but not duplicate-safe. Two goroutines can both miss on Line 56, both run
authenticateFromIndex, and bothStoreon Line 85. Sincedescribe affectedreaches auth from parallel workers, the first wave can still issue duplicate AssumeRole calls for the same realm+chain. Gate the miss path with per-key singleflight/in-flight coordination, then lock it down with a same-key concurrent test.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/auth/manager_chain.go` around lines 55 - 87, The current cache miss path (using chainCacheKey + processCredentialCache) is race-safe but allows duplicate authentication because concurrent goroutines can both miss at the initial Load and both call authenticateFromIndex, causing duplicate AssumeRole calls; fix by adding per-key in-flight coordination (e.g., use singleflight.Group or a per-key mutex map) around the miss path: when Load misses, enter the singleflight call keyed by chainCacheKey, inside the singleflight function re-check processCredentialCache.Load (or call findFirstValidCachedCredentials) to avoid a lost-race, then call authenticateFromIndex only once and Store the resulting processCachedCreds to processCredentialCache before returning; ensure the public path around cache load -> singleflight -> re-check -> authenticateFromIndex -> Store uses chainCacheKey and references processCredentialCache, authenticateFromIndex, findFirstValidCachedCredentials, processCachedCreds and m.chainCacheKey.
🧹 Nitpick comments (1)
pkg/auth/manager_chain.go (1)
36-42: Keep the reset hook out of the production API.This helper exists only for test isolation, and the new tests already live in
package auth, so an unexportedresetProcessCredentialCache()would work here. If you ever need it from external-package tests,export_test.gokeeps that surface test-only instead of making it part of the runtime package API.Based on learnings:
pkg/utils now exposes ResetPathMatchCache() and ResetGlobMatchesCache() (export_test.go).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/auth/manager_chain.go` around lines 36 - 42, The exported ResetProcessCredentialCache helper is test-only and should be hidden from the production API: rename the function ResetProcessCredentialCache to an unexported resetProcessCredentialCache and keep its logic operating on processCredentialCache.Range/Delete; if external-package tests need access, move or duplicate an exported wrapper into an export_test.go test-only file instead of keeping this in the runtime package API.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/auth/manager_chain.go`:
- Around line 45-47: The current chainCacheKey (manager.chainCacheKey) builds a
key by concatenating m.realm.Value and strings.Join(m.chain, "->"), which can
collide when values contain ":" or "->"; replace this with an unambiguous
encoding (e.g., JSON-encode a small struct {Realm string; Chain []string} or
base64 of a deterministic encoding) so realm and chain elements are uniquely
separated; update chainCacheKey to return that encoded string (and handle any
encoding error deterministically) instead of the current simple concatenation.
---
Outside diff comments:
In `@pkg/auth/manager_chain.go`:
- Around line 55-87: The current cache miss path (using chainCacheKey +
processCredentialCache) is race-safe but allows duplicate authentication because
concurrent goroutines can both miss at the initial Load and both call
authenticateFromIndex, causing duplicate AssumeRole calls; fix by adding per-key
in-flight coordination (e.g., use singleflight.Group or a per-key mutex map)
around the miss path: when Load misses, enter the singleflight call keyed by
chainCacheKey, inside the singleflight function re-check
processCredentialCache.Load (or call findFirstValidCachedCredentials) to avoid a
lost-race, then call authenticateFromIndex only once and Store the resulting
processCachedCreds to processCredentialCache before returning; ensure the public
path around cache load -> singleflight -> re-check -> authenticateFromIndex ->
Store uses chainCacheKey and references processCredentialCache,
authenticateFromIndex, findFirstValidCachedCredentials, processCachedCreds and
m.chainCacheKey.
---
Nitpick comments:
In `@pkg/auth/manager_chain.go`:
- Around line 36-42: The exported ResetProcessCredentialCache helper is
test-only and should be hidden from the production API: rename the function
ResetProcessCredentialCache to an unexported resetProcessCredentialCache and
keep its logic operating on processCredentialCache.Range/Delete; if
external-package tests need access, move or duplicate an exported wrapper into
an export_test.go test-only file instead of keeping this in the runtime package
API.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3a309404-eef4-40f2-97f7-3ed46b5a5711
📒 Files selected for processing (2)
pkg/auth/manager_chain.gopkg/auth/manager_chain_process_cache_test.go
721bc26 to
6bafe3f
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/auth/manager_chain.go (1)
51-89: Addperf.TracktoauthenticateChain().The cache hit/miss flow looks good. I’d still add
defer perf.Track(nil, "auth.Manager.authenticateChain")()here so the latency win is measurable and this hot path stays aligned with the repo’s tracking convention.♻️ Proposed change
func (m *manager) authenticateChain(ctx context.Context, _ string) (types.ICredentials, error) { + defer perf.Track(nil, "auth.Manager.authenticateChain")() + // Fast path: check process-level in-memory cache. // Credentials authenticated during this process are guaranteed correct, unlike // keyring/file caches which may hold stale data from previous runs.Based on learnings "performance tracking with
defer perf.Track()is enforced on all functions via linting, including high-frequency utility functions, formatters, and renderers."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/auth/manager_chain.go` around lines 51 - 89, Add performance tracking to authenticateChain by invoking perf.Track at function entry: in the method manager.authenticateChain, immediately after the function begins, add a deferred call using defer perf.Track(nil, "auth.Manager.authenticateChain")() so the function's latency is recorded; ensure the perf package is imported if not already. This keeps the hot path aligned with repo tracking conventions and measures cache hit/miss latency for authenticateChain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/auth/manager_chain.go`:
- Around line 51-89: Add performance tracking to authenticateChain by invoking
perf.Track at function entry: in the method manager.authenticateChain,
immediately after the function begins, add a deferred call using defer
perf.Track(nil, "auth.Manager.authenticateChain")() so the function's latency is
recorded; ensure the perf package is imported if not already. This keeps the hot
path aligned with repo tracking conventions and measures cache hit/miss latency
for authenticateChain.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c20f5b48-166c-4131-86ce-45bbab3fe268
📒 Files selected for processing (2)
pkg/auth/manager_chain.gopkg/auth/manager_chain_process_cache_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/auth/manager_chain_process_cache_test.go
|
Hi! I'm I would like to apply some automated changes to this pull request, but it looks like I don't have the necessary permissions to do so. To get this pull request into a mergeable state, please do one of the following two things:
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2272 +/- ##
==========================================
+ Coverage 77.16% 77.20% +0.03%
==========================================
Files 1034 1034
Lines 97576 97597 +21
==========================================
+ Hits 75296 75346 +50
+ Misses 18073 18046 -27
+ Partials 4207 4205 -2
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
65cbc13 to
d9af71c
Compare
…le calls The previous fix (dbcba35) correctly forces re-authentication of the target identity to prevent stale cached credentials from being returned. However, during commands like `atmos describe affected`, each `!terraform.state` YAML function resolution creates a new AuthManager via resolveAuthManagerForNestedComponent, and each one triggers a full AssumeRole API call since the target identity cache is always skipped. This caused `describe affected` to take ~17 minutes instead of ~2. Add a process-scoped in-memory sync.Map that caches authenticated credentials keyed by realm + chain. Within a single CLI invocation, once a chain has been authenticated, subsequent managers with the same chain reuse the result. Cached entries are validated for expiration before use. Unlike keyring/file caches that persist across processes and may contain stale data, the in-memory cache is inherently fresh and resets when the process exits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d9af71c to
1533c1e
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/auth/manager_chain_process_cache_test.go (1)
47-242: Consider table-driving the scenario tests to reduce duplication.These scenarios share setup shape and differ mainly by realm/chain/cache seed + expected auth call count; a table-driven harness will shrink repetition and make additions safer.
As per coding guidelines, "Use table-driven tests for testing multiple scenarios in Go."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/auth/manager_chain_process_cache_test.go` around lines 47 - 242, These four near-duplicate tests should be combined into a single table-driven test (e.g. TestProcessCredentialCache_TableDriven) that iterates over cases; for each case call resetProcessCredentialCache(), optionally pre-seed processCredentialCache with the case's seed key/value, construct a manager with the case's chain/realm/identities/providers/credentialStore (reusing the existing test helper types testProvider, countingIdentity, testCreds, testStore), call m.authenticateChain(ctx, role), then assert the returned creds and the identity.callCount matches the case's expected count; use t.Run(case.name, func(t *testing.T){ ... }) to isolate cases and require/assert as in the originals. Ensure you reference and reuse authenticateChain, resetProcessCredentialCache, processCredentialCache, manager.chain, manager.realm.Value, and countingIdentity.callCount to implement the checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/auth/manager_chain_process_cache_test.go`:
- Around line 17-46: Remove the manual countingIdentity test double and replace
it with a mock generated via go.uber.org/mock/mockgen: add a //go:generate
directive in the pkg/auth package referencing the types.Identity interface, run
mockgen to produce a mock (e.g., mock_types_identity.go with
mock.NewMockIdentity), then update tests that referenced countingIdentity
(including uses of Authenticate, PrepareEnvironment, GetProviderName, etc.) to
construct the generated mock, configure expectations/Return values for
Authenticate and other methods, and assert call counts via gomock's EXPECT()
instead of countingIdentity.callCount.
In `@pkg/auth/manager_chain.go`:
- Around line 56-58: The current unchecked type assertion
entry.(*processCachedCreds) can panic if a malformed value is in
processCredentialCache; change it to a safe assertion (e.g., cached, ok :=
entry.(*processCachedCreds)) and skip/evict the entry when ok is false before
calling m.isCredentialValid("process-cache", cached.credentials); ensure you
reference processCredentialCache, cacheKey, processCachedCreds, and
m.isCredentialValid when making the check so the code either continues safely or
clears the bad cache entry instead of panicking.
---
Nitpick comments:
In `@pkg/auth/manager_chain_process_cache_test.go`:
- Around line 47-242: These four near-duplicate tests should be combined into a
single table-driven test (e.g. TestProcessCredentialCache_TableDriven) that
iterates over cases; for each case call resetProcessCredentialCache(),
optionally pre-seed processCredentialCache with the case's seed key/value,
construct a manager with the case's
chain/realm/identities/providers/credentialStore (reusing the existing test
helper types testProvider, countingIdentity, testCreds, testStore), call
m.authenticateChain(ctx, role), then assert the returned creds and the
identity.callCount matches the case's expected count; use t.Run(case.name,
func(t *testing.T){ ... }) to isolate cases and require/assert as in the
originals. Ensure you reference and reuse authenticateChain,
resetProcessCredentialCache, processCredentialCache, manager.chain,
manager.realm.Value, and countingIdentity.callCount to implement the checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 41ab2ee5-4ddb-465d-aa28-c2f95557e129
📒 Files selected for processing (2)
pkg/auth/manager_chain.gopkg/auth/manager_chain_process_cache_test.go
|
These changes were released in v1.214.0-rc.2. |
what
sync.Map) toauthenticateChain()that stores successfully authenticated credentials keyed by realm + chain identitywhy
atmos describe affected, each!terraform.stateYAML function resolution creates a newAuthManagerviaresolveAuthManagerForNestedComponent, and each one triggers a full AssumeRole API callatmos describe affectedto degrade from ~2 minutes to ~17 minutes due to N redundant STS AssumeRole calls for N components sharing the same auth chainminCredentialValidityBuffer= 15m) before reuse, and the cache resets naturally when the process exitsreferences
Summary by CodeRabbit
New Features
Tests